Compressed Pattern Matching for SEQUITUR

Authors

  • Shuichi Mitarai
  • Masahiro Hirao
  • Tetsuya Matsumoto
  • Ayumi Shinohara
  • Masayuki Takeda
  • Setsuo Arikawa
Abstract

Sequitur, due to Nevill-Manning and Witten [18], is a powerful program that infers a phrase hierarchy from the input text and also provides extremely effective compression of large quantities of semi-structured text [17]. In this paper, we address the problem of searching Sequitur-compressed text directly. We present a compressed pattern matching algorithm that finds a pattern in compressed text without explicit decompression. We show that our algorithm is approximately 1.27 times faster than decompression followed by an ordinary search.
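To make the idea of searching without explicit decompression concrete, here is a minimal sketch that counts occurrences of a pattern in text described by a grammar of the kind Sequitur produces. It is not the authors' algorithm, which the abstract does not describe. It is a generic bottom-up technique: for each rule it keeps only an occurrence count plus the first and last m-1 characters of the rule's expansion, where m is the pattern length. The names `count_occurrences` and `rules`, the dictionary encoding of the grammar, and the example grammar itself are assumptions made for illustration.

```python
from functools import lru_cache


def count_occurrences(rules, start, pattern):
    """Count (possibly overlapping) occurrences of `pattern` in the text
    derived from `start`, without expanding the whole text.

    Assumed encoding: `rules` maps a nonterminal name to a list of symbols;
    any symbol that is not a key of `rules` is a single-character terminal.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    k = m - 1  # characters of context kept on each side of a phrase

    def head(s):
        return s[:k]

    def tail(s):
        return s[max(0, len(s) - k):]

    def crossing(left, right):
        # Both parts are shorter than the pattern, so every hit in the
        # joined string spans the boundary between them.
        joined = left + right
        hits, pos = 0, joined.find(pattern)
        while pos != -1:
            hits += 1
            pos = joined.find(pattern, pos + 1)
        return hits

    @lru_cache(maxsize=None)
    def summarize(symbol):
        # Returns (occurrence count, first k chars, last k chars).
        if symbol not in rules:  # terminal character
            return (1 if symbol == pattern else 0), head(symbol), tail(symbol)
        count, pre, suf = 0, "", ""
        for child in rules[symbol]:
            c_count, c_pre, c_suf = summarize(child)
            count += c_count + crossing(suf, c_pre)
            pre = head(pre + c_pre)
            suf = tail(suf + c_suf)
        return count, pre, suf

    return summarize(start)[0]


# Hypothetical hand-written grammar for the text "abcdbcabcd".
rules = {"S": ["C", "A", "C"], "A": ["b", "c"], "C": ["a", "A", "d"]}
print(count_occurrences(rules, "S", "bc"))
```

Each rule is summarized once and reused wherever it appears, so the work depends on the grammar size and the pattern length rather than on the length of the decompressed text.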

Similar resources

Collage system: a unifying framework for compressed pattern matching

We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framew...
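As a rough illustration of the representation described in this summary, the sketch below models a dictionary D whose entries are built by concatenation, truncation, and repetition, and expands a sequence S of dictionary names into plain text. The tuple encoding, the operation names (`char`, `concat`, `drop_prefix`, `drop_suffix`, `repeat`), and the reading of truncation as dropping a fixed number of characters from one end are assumptions for illustration, not definitions taken from the paper.

```python
def expand(dictionary, sequence):
    """Expand a sequence of dictionary names into the string it represents.

    Assumed encoding of dictionary entries:
      ("char", c)           a single character
      ("concat", X, Y)      text of X followed by text of Y
      ("drop_prefix", X, j) text of X with its first j characters removed
      ("drop_suffix", X, j) text of X with its last j characters removed
      ("repeat", X, i)      text of X repeated i times
    """
    memo = {}

    def value(name):
        if name not in memo:
            op, *args = dictionary[name]
            if op == "char":
                text = args[0]
            elif op == "concat":
                text = value(args[0]) + value(args[1])
            elif op == "drop_prefix":
                text = value(args[0])[args[1]:]
            elif op == "drop_suffix":
                base = value(args[0])
                text = base[:len(base) - args[1]]
            elif op == "repeat":
                text = value(args[0]) * args[1]
            else:
                raise ValueError(f"unknown operation: {op}")
            memo[name] = text
        return memo[name]

    return "".join(value(name) for name in sequence)


# Hypothetical dictionary D and sequence S.
D = {
    "X1": ("char", "a"),
    "X2": ("char", "b"),
    "X3": ("concat", "X1", "X2"),    # "ab"
    "X4": ("repeat", "X3", 3),       # "ababab"
    "X5": ("drop_prefix", "X4", 1),  # "babab"
    "X6": ("drop_suffix", "X4", 2),  # "abab"
}
S = ["X4", "X5", "X6"]
print(expand(D, S))
```

This sketch only decompresses; a compressed pattern matching algorithm would instead process D and S directly, without materializing the expanded string.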

Pattern-Matching Problems for

The power of weighted finite automata to describe very complex images was widely studied, see [5, 6, 7]. Finite automata can also be used as an effective tool for compression of two-dimensional images. There are some software packages using this type of compression, see [12, 6]. We consider the complexity of some pattern-matching problems for two-dimensional images which are highly compressed using...

The Complexity of Two-dimensional Compressed Pattern Matching

We study the computational complexity of two-dimensional compressed pattern matching problems. Among other things, we design an efficient randomized algorithm for the equality problem of two compressed two-dimensional patterns, and we prove the computational hardness of general two-dimensional compressed pattern matching.

The Complexity of Two-Dimensional Compressed Pattern-

We consider the complexity of problems for highly compressed 2-dimensional texts: compressed pattern-matching (when the pattern is not compressed and the text is compressed) and fully compressed pattern-matching (when the pattern is also compressed). First we consider 2-dimensional compression in terms of straight-line programs, see [9]. It is a natural way for representing very highly compresse...

A New Compression Method for Compressed Matching

A practical adaptive compression algorithm based on LZSS is presented, which is especially constructed to solve the compressed pattern matching problem, i.e., pattern matching directly in a compressed text without decompressing.

Publication date: 2001